The following files are available:
- Dataset_User_Agreement.pdf
- photos
- photos.json
Number of lines in the file "photos.json": 200100
{'photo_id': 'zsvj7vloL4L5jhYyPIuVwg',
'business_id': 'Nk-SJhPlDBkAZvfsADtccA',
'caption': 'Nice rock artwork everywhere and craploads of taps.',
'label': 'inside'}
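A minimal sketch of how the line count and the first record could be obtained. It assumes `photos.json` is stored as JSON Lines (one JSON object per line), which the matching line and row counts suggest but the source does not state. The helper `load_json_lines` and the file name `photos_demo.json` are hypothetical, and the two records are a small stand-in for the real file.

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Small stand-in for photos.json, using the same four fields as the record above.
sample = [
    {"photo_id": "zsvj7vloL4L5jhYyPIuVwg", "business_id": "Nk-SJhPlDBkAZvfsADtccA",
     "caption": "Nice rock artwork everywhere and craploads of taps.", "label": "inside"},
    {"photo_id": "HCUdRJHHm_e0OCTlZetGLg", "business_id": "yVZtL5MmrpiivyCIrVkGgA",
     "caption": "", "label": "outside"},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "photos_demo.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        for rec in sample:
            f.write(json.dumps(rec) + "\n")
    records = load_json_lines(demo_path)

print("Number of lines:", len(records))
print(records[0])
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame` to obtain a table like the one that follows.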
| | photo_id | business_id | caption | label |
|---|---|---|---|---|
| 0 | zsvj7vloL4L5jhYyPIuVwg | Nk-SJhPlDBkAZvfsADtccA | Nice rock artwork everywhere and craploads of ... | inside |
| 1 | HCUdRJHHm_e0OCTlZetGLg | yVZtL5MmrpiivyCIrVkGgA | | outside |
| 2 | vkr8T0scuJmGVvN2HJelEA | _ab50qdWOk0DdB6XOrBitw | oyster shooter | drink |
| 3 | pve7D6NUrafHW3EAORubyw | SZU9c8V2GuREDN5KgyHFJw | Shrimp scampi | food |
| 4 | H52Er-uBg6rNrHcReWTD2w | Gzur0f0XMkrVxIwYJvOt2g | | food |
| ... | ... | ... | ... | ... |
| 200095 | 4Zia9NkAfQNjMfcIDhwJ-g | 2HxkdqHmbYGj_BH1bLaiSw | #Nektar | food |
| 200096 | KB96KRZRhRm8hUkI-OpGEA | _gVyuTRb_6HM-SNtqbpevQ | | inside |
| 200097 | Klmojvaf2_2dP1XKzTsFmQ | NUyEOjfAl3HvkpzSpdwqeA | | food |
| 200098 | FNEiq7Mogec7t31OaU5juw | hE6YsHHV0fCz_UrGS4o6VA | Drinks by the water! | drink |
| 200099 | NHEtLh7APk7Yssjo0h45VA | VIYvcX9SScnqmoI0so1KZA | | food |
200100 rows × 4 columns
The resulting dataframe contains 200100 photos and 4 variables.
Number of photos per label:
| | label | nb_photos |
|---|---|---|
| 0 | drink | 15670 |
| 1 | food | 108152 |
| 2 | inside | 56031 |
| 3 | menu | 1678 |
| 4 | outside | 18569 |
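A count table of this shape can be produced with a pandas `groupby`. The sketch below uses a six-row toy dataframe rather than the real data. The column names `photo_id`, `label` and `nb_photos` follow the tables above, and the rest is illustrative.

```python
import pandas as pd

# Toy stand-in for the photos dataframe.
df = pd.DataFrame({
    "photo_id": ["a", "b", "c", "d", "e", "f"],
    "label": ["food", "food", "inside", "drink", "food", "menu"],
})

# Count photos per label; groupby sorts the labels alphabetically by default.
counts = (
    df.groupby("label")["photo_id"]
      .count()
      .reset_index(name="nb_photos")
)
print(counts)
```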
The photo labels are:
- inside
- outside
- drink
- food
- menu

We select 200 photos per label, for a total of 1000 photos.
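One way to draw a balanced sample is `DataFrameGroupBy.sample`. This is a sketch under assumptions: the source does not say how the 200 photos per label were chosen (random or first-N), the helper name `sample_per_label` and the seed are hypothetical, and the toy dataframe has 4 photos per label instead of thousands.

```python
import pandas as pd


def sample_per_label(df, n_per_label, seed=0):
    """Draw n_per_label rows at random from each label group."""
    return (
        df.groupby("label", group_keys=False)
          .sample(n=n_per_label, random_state=seed)
          .reset_index(drop=True)
    )


labels = ["inside", "outside", "drink", "food", "menu"]
toy = pd.DataFrame({
    "photo_id": [f"{lab}_{i}" for lab in labels for i in range(4)],
    "label": [lab for lab in labels for _ in range(4)],
})

sampled = sample_per_label(toy, n_per_label=2)
print(sampled["label"].value_counts())
```

With `n_per_label=200` on the full dataframe this would yield the 1000-photo subset described above.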
Photo size in BGR mode: width 399 px, height 600 px
Pixel value at (50,10) in BGR mode: [97 90 95]
Photo size: width 399 px, height 600 px
Pixel value at (50,10): 92
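The two pixel values are consistent with a standard BGR-to-grayscale conversion. The sketch below assumes the second output comes from a grayscale version of the same photo and applies the usual luma weights (0.299 R + 0.587 G + 0.114 B). The source does not name the conversion, so this is an inference. The zero-filled array only illustrates the (height, width, channels) layout of an image loaded in BGR order.

```python
import numpy as np

# An image of width 399 and height 600 is stored as (rows, cols, channels).
img = np.zeros((600, 399, 3), dtype=np.uint8)
height, width = img.shape[:2]
print("Width:", width, "px  Height:", height, "px")

# BGR pixel reported above, in B, G, R order.
bgr_pixel = np.array([97, 90, 95], dtype=np.float64)

# Luma weights in the same B, G, R order.
weights_bgr = np.array([0.114, 0.587, 0.299])

gray = int(round(float(bgr_pixel @ weights_bgr)))
print(gray)  # 92
```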
Descriptors shape: (2998, 128)
Number of descriptors: (1284341, 128)
Estimated number of clusters: 1133
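The reported values match a square-root heuristic: 1133 is the rounded square root of the total number of descriptors, and the `init_size` of 3399 in the estimator shown next is three times that. The source does not state the rule, so the sketch below is an inference from the numbers.

```python
import math

n_descriptors = 1284341  # total descriptors across the 1000 photos

# Heuristic (assumed): number of clusters = sqrt(number of descriptors).
k = round(math.sqrt(n_descriptors))

# Assumed: initialisation sample of three points per cluster.
init_size = 3 * k

print(k, init_size)  # 1133 3399
```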
MiniBatchKMeans(init_size=3399, n_clusters=1133, random_state=0)
(1000, 1133)
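The shape (1000, 1133) suggests one histogram of 1133 cluster counts per photo, as in a bag-of-visual-words representation. The sketch below shows that construction with NumPy only. The function name `bovw_histogram`, the toy dimensions and the random data are assumptions. In the real pipeline the centers would presumably come from the fitted `MiniBatchKMeans`.

```python
import numpy as np


def bovw_histogram(descriptors, centers):
    """Count how many descriptors fall nearest to each cluster center."""
    # Squared Euclidean distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest center per descriptor
    return np.bincount(words, minlength=len(centers))


rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))  # toy: 4 clusters, 8-D descriptors
images = [rng.normal(size=(n, 8)) for n in (30, 12, 50)]  # 3 toy photos

# One row per photo, one column per cluster.
features = np.vstack([bovw_histogram(d, centers) for d in images])
print(features.shape)  # (3, 4)
```

Each row sums to the number of descriptors of that photo, so photos with more keypoints have larger counts unless the histograms are normalised.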
Dataset dimensions before PCA reduction: (1000, 1133)
Dataset dimensions after PCA reduction: (1000, 704)
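A reduction of this kind keeps the smallest number of principal components that explain a chosen share of the variance. The sketch below implements that with an SVD. The 0.99 threshold and the helper name `pca_reduce` are assumptions, since the source does not give the threshold that led to 704 components.

```python
import numpy as np


def pca_reduce(X, variance_kept=0.99):
    """Project X onto the fewest principal components reaching variance_kept."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # variance ratio per component
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the threshold.
    n_components = int(np.searchsorted(cumulative, variance_kept) + 1)
    n_components = min(n_components, len(S))
    return Xc @ Vt[:n_components].T, n_components


# Toy data: 20 features that really live in a 3-D subspace, plus tiny noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))
X = X + 1e-6 * rng.normal(size=(100, 20))

reduced, n = pca_reduce(X)
print("before:", X.shape, "after:", reduced.shape)
```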
| | tsne1 | tsne2 | class | cluster_estimé |
|---|---|---|---|---|
| 0 | 1.744463 | -0.679428 | inside | 3 |
| 1 | -2.131394 | -2.400133 | inside | 1 |
| 2 | 0.266613 | -0.870728 | inside | 3 |
| 3 | 1.332294 | -1.648237 | inside | 3 |
| 4 | -3.441514 | -2.701073 | inside | 1 |
ARI : 0.19151298134378453
Number of photos per cluster:
| | cluster_estimé | class |
|---|---|---|
| 0 | 0 | 163 |
| 1 | 1 | 259 |
| 2 | 2 | 155 |
| 3 | 3 | 244 |
| 4 | 4 | 179 |
| | drink | food | inside | menu | outside |
|---|---|---|---|---|---|
| drink | 61 | 65 | 31 | 9 | 34 |
| food | 61 | 115 | 11 | 2 | 11 |
| inside | 20 | 51 | 92 | 7 | 30 |
| menu | 1 | 0 | 16 | 135 | 48 |
| outside | 12 | 28 | 94 | 10 | 56 |
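The ARI reported earlier can be recomputed from the table above, since the adjusted Rand index depends only on the contingency counts and not on how the clusters are named. The sketch below is a plain NumPy version of the standard formula. The function name `adjusted_rand_index` is hypothetical.

```python
import numpy as np


def adjusted_rand_index(contingency):
    """Adjusted Rand index from a class-by-cluster contingency table."""
    c = np.asarray(contingency, dtype=np.int64)

    def pairs(x):
        # Number of unordered pairs among x items: x choose 2.
        return x * (x - 1) // 2

    sum_cells = pairs(c).sum()
    sum_rows = pairs(c.sum(axis=1)).sum()
    sum_cols = pairs(c.sum(axis=0)).sum()
    total = pairs(c.sum())

    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


# Rows: true labels (drink, food, inside, menu, outside); columns: clusters.
table = [
    [61, 65, 31, 9, 34],
    [61, 115, 11, 2, 11],
    [20, 51, 92, 7, 30],
    [1, 0, 16, 135, 48],
    [12, 28, 94, 10, 56],
]

ari = adjusted_rand_index(table)
print(round(ari, 4))  # 0.1915
```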
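| | precision | recall | f1-score | support |
|---|---|---|---|---|
| drink | 0.39 | 0.30 | 0.34 | 200 |
| food | 0.44 | 0.57 | 0.50 | 200 |
| inside | 0.38 | 0.46 | 0.41 | 200 |
| menu | 0.83 | 0.68 | 0.74 | 200 |
| outside | 0.31 | 0.28 | 0.30 | 200 |
| accuracy | | | 0.46 | 1000 |
| macro avg | 0.47 | 0.46 | 0.46 | 1000 |
| weighted avg | 0.47 | 0.46 | 0.46 | 1000 |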
Dataset dimensions before PCA reduction: (1000, 25088)
Dataset dimensions after PCA reduction: (1000, 922)
| | tsne1 | tsne2 | class | cluster_estimé |
|---|---|---|---|---|
| 0 | 12.090289 | 3.025154 | inside | 3 |
| 1 | -28.332422 | -44.978344 | inside | 2 |
| 2 | 17.779991 | -27.934637 | inside | 0 |
| 3 | 40.507111 | -11.656160 | inside | 0 |
| 4 | 17.088072 | -45.767704 | inside | 2 |
ARI : 0.49055651110189774
| | cluster_estimé | class |
|---|---|---|
| 0 | 0 | 221 |
| 1 | 1 | 131 |
| 2 | 2 | 172 |
| 3 | 3 | 213 |
| 4 | 4 | 263 |
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| drink | 15 | 1 | 6 | 168 | 10 |
| food | 2 | 127 | 4 | 33 | 34 |
| inside | 127 | 2 | 54 | 7 | 10 |
| menu | 0 | 1 | 2 | 0 | 197 |
| outside | 77 | 0 | 106 | 5 | 12 |
| | drink | food | inside | menu | outside |
|---|---|---|---|---|---|
| drink | 168 | 1 | 15 | 10 | 6 |
| food | 33 | 127 | 2 | 34 | 4 |
| inside | 7 | 2 | 127 | 10 | 54 |
| menu | 0 | 1 | 0 | 197 | 2 |
| outside | 5 | 0 | 77 | 12 | 106 |
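The label-by-label matrix above is the cluster-indexed table with its columns reordered. One way to obtain that ordering is to assign each cluster to the class that dominates it, as sketched below. This majority rule is an assumption: it reproduces the table here, but the source does not say how the mapping was chosen, and the rule only works when every cluster picks a different class.

```python
import numpy as np

labels = ["drink", "food", "inside", "menu", "outside"]

# Rows: true labels; columns: clusters 0..4 (counts from the table above).
by_cluster = np.array([
    [15, 1, 6, 168, 10],
    [2, 127, 4, 33, 34],
    [127, 2, 54, 7, 10],
    [0, 1, 2, 0, 197],
    [77, 0, 106, 5, 12],
])

# For each cluster, the index of the class with the most photos in it.
cluster_to_class = by_cluster.argmax(axis=0)
assert len(set(cluster_to_class.tolist())) == len(labels), "mapping must be one-to-one"

for cluster, cls in enumerate(cluster_to_class):
    print(f"cluster {cluster} -> {labels[cls]}")

# Reorder the columns so that column j holds the cluster mapped to class j.
order = np.argsort(cluster_to_class)
confusion = by_cluster[:, order]
print(confusion)
```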
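| | precision | recall | f1-score | support |
|---|---|---|---|---|
| drink | 0.79 | 0.84 | 0.81 | 200 |
| food | 0.97 | 0.64 | 0.77 | 200 |
| inside | 0.57 | 0.64 | 0.60 | 200 |
| menu | 0.75 | 0.98 | 0.85 | 200 |
| outside | 0.62 | 0.53 | 0.57 | 200 |
| accuracy | | | 0.73 | 1000 |
| macro avg | 0.74 | 0.72 | 0.72 | 1000 |
| weighted avg | 0.74 | 0.72 | 0.72 | 1000 |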
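The per-class figures in this report follow directly from the confusion matrix: precision divides each diagonal entry by its column total, recall by its row total. The sketch below recomputes them with NumPy, assuming rows are true labels and columns are predicted labels, as the row totals of 200 suggest.

```python
import numpy as np

labels = ["drink", "food", "inside", "menu", "outside"]

# Rows: true labels; columns: predicted labels (second confusion matrix above).
confusion = np.array([
    [168, 1, 15, 10, 6],
    [33, 127, 2, 34, 4],
    [7, 2, 127, 10, 54],
    [0, 1, 0, 197, 2],
    [5, 0, 77, 12, 106],
])

tp = np.diag(confusion)                 # correctly assigned photos per class
precision = tp / confusion.sum(axis=0)  # share of each predicted class that is right
recall = tp / confusion.sum(axis=1)     # share of each true class that is found
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / confusion.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:8s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy:.3f}")
```

The exact accuracy is 725/1000, so it sits on the rounding boundary between the 0.73 accuracy and the 0.72 average recall printed in the report.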